
Temporal Index Sharding for Space-time Efficiency in Archive Search



Abstract

Time-travel queries, which couple temporal constraints with keyword queries, are useful for searching large-scale archives of time-evolving content such as the Web, document collections, and wikis. Typical approaches to evaluating these queries efficiently involve \emph{slicing} along the time axis either the entire collection~\cite{253349} or individual index lists~\cite{kberberi:sigir2007}. Neither method is satisfactory: both sacrifice index compactness for processing efficiency, leaving the index either too big or too slow. We present a novel index organization scheme that \emph{shards} the index with \emph{zero increase in index size} while still minimizing the cost of reading index entries during query processing. Based on the optimal sharding thus obtained, we develop a practically efficient sharding that takes into account the different costs of random and sequential accesses. Our algorithm carefully merges shards from the optimal solution, allowing a few extra sequential accesses in exchange for a significant reduction in random accesses. Finally, we empirically establish the effectiveness of our sharding scheme via detailed experiments over the edit history of the English version of Wikipedia between 2001 and 2005 ($\approx$ 700 GB) and an archive of the UK governmental web sites ($\approx$ 400 GB). Our results demonstrate the feasibility of faster time-travel query processing with no space overhead.
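The trade-off between random and sequential accesses described in the abstract can be illustrated with a toy cost model. The sketch below is not the paper's algorithm: the `Posting` layout, the `read_shard` and `evaluate` helpers, the cost constants, and the assumption that a shard whose end times are non-decreasing can be entered by binary search are all hypothetical choices made for illustration only.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    doc_id: int
    begin: int  # first time point at which this version is valid (inclusive)
    end: int    # time point at which it stops being valid (exclusive)


def read_shard(shard, t):
    """Return (matching doc_ids, entries read) for time point t.

    Assumption: postings in a shard are sorted by begin time. If the end
    times are also non-decreasing, we can binary-search to the first
    posting still alive at t; otherwise we must scan from the start.
    """
    ends = [p.end for p in shard]
    monotone = all(a <= b for a, b in zip(ends, ends[1:]))
    start = bisect_right(ends, t) if monotone else 0
    matches, read = [], 0
    for p in shard[start:]:
        if p.begin > t:      # later postings begin after t, stop scanning
            break
        read += 1
        if p.end > t:        # posting is valid at t
            matches.append(p.doc_id)
    return matches, read


def evaluate(shards, t, random_cost, sequential_cost):
    """Toy cost: one random access per shard plus one unit per entry read."""
    docs, total_read = [], 0
    for shard in shards:
        matches, read = read_shard(shard, t)
        docs.extend(matches)
        total_read += read
    cost = len(shards) * random_cost + total_read * sequential_cost
    return sorted(docs), total_read, cost


a = Posting(1, 0, 100)
b = Posting(2, 1, 3)
c = Posting(3, 2, 4)
d = Posting(4, 50, 60)

merged = [[a, b, c, d]]       # one list: 1 random access, 4 entries read
sharded = [[a], [b, c, d]]    # two shards: 2 random accesses, 2 entries read

print(evaluate(merged, 55, random_cost=10.0, sequential_cost=1.0))
print(evaluate(sharded, 55, random_cost=10.0, sequential_cost=1.0))
```

Under this toy model, both layouts store the same four postings and return the same documents. The sharded layout reads fewer entries but pays for an extra random access, so which layout is cheaper depends on the ratio of the two cost constants; this is the kind of consideration that motivates merging shards when random accesses are expensive.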
